
    Probabilistic Model Counting with Short XORs

    The idea of counting the number of satisfying truth assignments (models) of a formula by adding random parity constraints can be traced back to the seminal work of Valiant and Vazirani, showing that NP is as easy as detecting unique solutions. While theoretically sound, the random parity constraints in that construction have the following drawback: each constraint, on average, involves half of all variables. As a result, the branching factor associated with searching for models that also satisfy the parity constraints quickly gets out of hand. In this work we prove that one can work with much shorter parity constraints and still get rigorous mathematical guarantees, especially when the number of models is large so that many constraints need to be added. Our work is based on the realization that the essential feature for random systems of parity constraints to be useful in probabilistic model counting is that the geometry of their set of solutions resembles an error-correcting code. Comment: To appear in SAT 17.
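
    The construction is easy to see in miniature. Below is a hedged Python sketch (hypothetical names, brute-force enumeration, and dense Valiant-Vazirani-style constraints rather than the short XORs the paper analyzes): each random parity constraint roughly halves the surviving models, so the number of constraints a formula survives before becoming unsatisfiable estimates the logarithm of its model count.

    import itertools, random

    def estimate_log2_models(n_vars, formula, trials=20):
        # Estimate log2 of the model count of `formula` (a predicate over
        # n_vars booleans) by adding random parity constraints until no
        # model survives.  Brute-force enumeration: toy sizes only.
        survived = []
        for _ in range(trials):
            constraints, i = [], 0
            while True:
                # Dense XOR: each variable included with probability 1/2;
                # the paper's point is that much sparser subsets suffice.
                subset = [v for v in range(n_vars) if random.random() < 0.5]
                parity = random.randint(0, 1)
                constraints.append((subset, parity))
                alive = any(
                    formula(x) and all(sum(x[v] for v in s) % 2 == p
                                       for s, p in constraints)
                    for x in itertools.product((0, 1), repeat=n_vars))
                if not alive:
                    survived.append(i)    # survived i, died on i + 1
                    break
                i += 1
        return sum(survived) / len(survived)

    # (x0 or x1) over 3 variables has 6 models; expect about log2(6) = 2.58
    print(estimate_log2_models(3, lambda x: x[0] or x[1]))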

    Palgol: A High-Level DSL for Vertex-Centric Graph Processing with Remote Data Access

    Pregel is a popular distributed computing model for dealing with large-scale graphs. However, it can be tricky to implement graph algorithms correctly and efficiently in Pregel's vertex-centric model, especially when the algorithm has multiple computation stages, complicated data dependencies, or even communication over dynamic internal data structures. Some domain-specific languages (DSLs) have been proposed to provide more intuitive ways to implement graph algorithms, but due to the lack of support for remote access (reading or writing attributes of other vertices through references) they cannot handle the above-mentioned dynamic communication, making a class of Pregel algorithms with fast convergence impossible to implement. To address this problem, we design and implement Palgol, a more declarative and powerful DSL which supports remote access. In particular, programmers can use a more declarative syntax called chain access to naturally specify dynamic communication as if directly reading data on arbitrary remote vertices. By analyzing the logic patterns of chain access, we provide a novel algorithm for compiling Palgol programs to efficient Pregel code. We demonstrate the power of Palgol by using it to implement several practical Pregel algorithms, and the evaluation results show that the efficiency of Palgol is comparable with that of hand-written code. Comment: 12 pages, 10 figures, extended version of APLAS 2017 paper.
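
    For readers unfamiliar with the model being compiled to, the sketch below (plain Python with illustrative names, not Palgol syntax or compiler output) shows the vertex-centric restriction that motivates remote access: vertices hold local state and exchange values only through messages to neighbors, one superstep at a time.

    from collections import defaultdict

    def connected_components(edges):
        # Min-label propagation: every vertex repeatedly adopts the
        # smallest vertex id it has heard of from its neighbors.
        nbrs = defaultdict(set)
        for u, v in edges:
            nbrs[u].add(v)
            nbrs[v].add(u)
        label = {v: v for v in nbrs}        # per-vertex local state
        active = set(nbrs)                  # vertices changed last round
        while active:                       # one superstep per iteration
            inbox = defaultdict(list)
            for v in active:                # Pregel: message neighbors,
                for w in nbrs[v]:           # never read them directly
                    inbox[w].append(label[v])
            active = set()
            for v, msgs in inbox.items():
                new = min(msgs)
                if new < label[v]:
                    label[v] = new
                    active.add(v)
        return label

    print(connected_components([(1, 2), (2, 3), (4, 5)]))
    # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}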

    Asynchronous Graph Pattern Matching on Multiprocessor Systems

    Pattern matching on large graphs is the foundation for a variety of application domains. Strict latency requirements and continuously increasing graph sizes demand the use of highly parallel in-memory graph processing engines that need to consider non-uniform memory access (NUMA) and concurrency issues to scale up on modern multiprocessor systems. To tackle these aspects, graph partitioning becomes increasingly important. Hence, in this paper we present a technique for processing graph pattern matching on NUMA systems. As a scalable pattern matching processing infrastructure, we leverage a data-oriented architecture that preserves data locality and minimizes concurrency-related bottlenecks on NUMA systems. We show in detail how graph pattern matching can be asynchronously processed on a multiprocessor system. Comment: 14 pages, extended version for ADBIS 2017.
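
    A toy sketch of the data-oriented idea (illustrative names; a single-threaded round-robin loop stands in for the asynchronous NUMA workers): the graph is partitioned, each partial match is processed only by the partition that owns its frontier vertex, and matches travel between partitions through per-partition queues rather than shared state.

    from collections import defaultdict, deque

    def match_two_hop_paths(edges, nparts=2):
        # Match the pattern a -> b -> c with no shared matching state:
        # every partial match is queued at the partition that owns its
        # frontier vertex, mimicking message-driven partition-local work.
        def owner(v):
            return hash(v) % nparts
        adj = defaultdict(list)
        for u, v in edges:
            adj[u].append(v)
        queues = [deque() for _ in range(nparts)]
        for u in adj:                       # seed single-vertex matches
            queues[owner(u)].append((u,))
        results = []
        while any(queues):
            for p in range(nparts):         # round-robin stand-in for
                while queues[p]:            # asynchronous worker threads
                    partial = queues[p].popleft()
                    if len(partial) == 3:
                        results.append(partial)
                        continue
                    for w in adj.get(partial[-1], []):
                        queues[owner(w)].append(partial + (w,))
        return results

    print(match_two_hop_paths([(1, 2), (2, 3), (2, 4)]))
    # [(1, 2, 3), (1, 2, 4)] in some order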

    Timing properties and correctness for structured parallel programs on x86-64 multicores

    This paper determines correctness and timing properties for structured parallel programs on x86-64 multicores. Multicore architectures are increasingly common, but real architectures have unpredictable timing properties, and even correctness is not obvious above the relaxed-memory concurrency models that are enforced by commonly used hardware. This paper takes a rigorous approach to correctness and timing properties, examining common locking protocols from first principles and extending this through queues to structured parallel constructs. We prove functional correctness and derive simple timing models, and extend both, for the first time, from low-level primitives to high-level parallel patterns. Our derived high-level timing models for structured parallel programs allow us to accurately predict upper bounds on program execution times on x86-64 multicores.
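
    The flavor of measure-then-model can be sketched as follows (a minimal Python stand-in with illustrative names; the paper derives its models from x86-64 locking primitives, not from Python's interpreter-mediated locks): measure lock acquisition latency under contention, then feed the observed mean and worst case into a simple cost formula for a parallel pattern.

    import statistics, threading, time

    def contended_lock_times(nthreads=4, iters=2000):
        # Measure lock-acquisition latency under contention; the mean and
        # observed worst case feed a simple timing model.  (CPython's GIL
        # makes this a demonstration, not an x86-64 measurement.)
        lock = threading.Lock()
        per_thread = []
        def worker():
            local = []
            for _ in range(iters):
                t0 = time.perf_counter_ns()
                with lock:                   # critical section
                    t1 = time.perf_counter_ns()
                local.append(t1 - t0)        # time spent acquiring
            per_thread.append(local)
        threads = [threading.Thread(target=worker) for _ in range(nthreads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        flat = [x for loc in per_thread for x in loc]
        return statistics.mean(flat), max(flat)

    mean_ns, worst_ns = contended_lock_times()
    # For n fully serialized tasks behind one lock, a first-cut model is
    #   T(n) ~ n * (t_work + t_acquire)
    print(f"mean acquire: {mean_ns:.0f} ns, observed worst: {worst_ns} ns")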

    On the Usability of Probably Approximately Correct Implication Bases

    We revisit the notion of probably approximately correct implication bases from the literature and present a first formulation in the language of formal concept analysis, with the goal of investigating whether such bases represent a suitable substitute for exact implication bases in practical use-cases. To this end, we quantitatively examine the behavior of probably approximately correct implication bases on artificial and real-world data sets and compare their precision and recall with respect to their corresponding exact implication bases. Using a small example, we also provide qualitative insight that implications from probably approximately correct bases can still represent meaningful knowledge from a given data set. Comment: 17 pages, 8 figures; typos corrected, corrected x-label on graphs.
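
    The precision/recall comparison can be pictured with a small stand-in (illustrative names; the paper's evaluation compares bases semantically, via the implications they entail, rather than by the literal set membership used here):

    def holds(context, A, B):
        # A -> B holds when every object having all of A also has all of B.
        return all(B <= obj for obj in context if A <= obj)

    def precision_recall(context, approx_base, exact_base):
        valid = [imp for imp in approx_base if holds(context, *imp)]
        precision = len(valid) / len(approx_base)
        recovered = [imp for imp in exact_base if imp in approx_base]
        recall = len(recovered) / len(exact_base)
        return precision, recall

    # Toy formal context: objects given as attribute sets.
    ctx = [frozenset("ab"), frozenset("abc"), frozenset("bc")]
    exact = [(frozenset("a"), frozenset("b"))]         # a -> b holds
    approx = [(frozenset("a"), frozenset("b")),
              (frozenset("c"), frozenset("a"))]        # c -> a fails
    print(precision_recall(ctx, approx, exact))        # (0.5, 1.0)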

    On the Computational Complexity of MapReduce

    In this paper we study MapReduce computations from a complexity-theoretic perspective. First, we formulate a uniform version of the MRC model of Karloff et al. (2010). We then show that the class of regular languages, and moreover all of sublogarithmic space, lies in constant-round MRC. This result also applies to the MPC model of Andoni et al. (2014). In addition, we prove that, conditioned on a variant of the Exponential Time Hypothesis, there are strict hierarchies within MRC, so that increasing the number of rounds or the amount of time per processor increases the power of MRC. To the best of our knowledge, we are the first to approach the MapReduce model with complexity-theoretic techniques, and our work lays the foundation for further analysis relating MapReduce to established complexity classes.
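
    The classical construction behind the constant-round result for regular languages is worth seeing concretely. In this sketch (hypothetical names, plain Python standing in for a MapReduce runtime), each mapper compresses its input chunk into the DFA's state-transition table for that chunk, and a single reduce round composes the tables in order, so membership is decided in two rounds with per-processor work linear in the chunk size.

    def chunkify(text, nchunks):
        k = -(-len(text) // nchunks)            # ceiling division
        return [text[i:i + k] for i in range(0, len(text), k)]

    def map_phase(delta, states, chunk):
        # Compress a chunk into the function q -> state after reading it.
        table = {}
        for q in states:
            s = q
            for ch in chunk:
                s = delta[s][ch]
            table[q] = s
        return table

    def reduce_phase(tables, start):
        s = start                               # compose in chunk order
        for t in tables:
            s = t[s]
        return s

    # DFA accepting strings with an even number of 'a's.
    delta = {0: {"a": 1, "b": 0}, 1: {"a": 0, "b": 1}}
    tables = [map_phase(delta, [0, 1], c) for c in chunkify("abba", 3)]
    print(reduce_phase(tables, 0) == 0)         # True: two 'a's is even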

    The Range of Topological Effects on Communication

    We continue the study of the communication cost of computing functions when inputs are distributed among $k$ processors, each of which is located at one vertex of a network/graph called a terminal. Every other node of the network also has a processor, with no input. The communication is point-to-point and the cost is the total number of bits exchanged by the protocol, in the worst case, on all edges. Chattopadhyay, Radhakrishnan and Rudra (FOCS'14) recently initiated a study of the effect of the topology of the network on the total communication cost, using tools from $L_1$ embeddings. Their techniques provided tight bounds for simple functions like Element-Distinctness (ED), which depend on the 1-median of the graph. This work addresses two other kinds of natural functions. We show that for a large class of natural functions like Set-Disjointness the communication cost is essentially $n$ times the cost of the optimal Steiner tree connecting the terminals. Further, we show that for natural composed functions like $\text{ED} \circ \text{XOR}$ and $\text{XOR} \circ \text{ED}$, the naive protocols suggested by their definitions are optimal for general networks. Interestingly, the bounds for these functions depend on more involved topological parameters that are a combination of Steiner tree and 1-median costs. To obtain our results, we use some new tools in addition to those used in Chattopadhyay et al. These include (i) viewing the communication constraints via a linear program; (ii) using tools from the theory of tree embeddings to prove topology-sensitive direct-sum results that handle the case of composed functions; and (iii) representing the communication constraints of certain problems as a collection of multiway cuts, where each multiway cut simulates the hardness of computing the function on the star topology.
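
    To make the Set-Disjointness bound concrete: the cost is roughly $n$ times the optimal Steiner tree cost over the terminal set. Since the optimal Steiner tree is NP-hard, the sketch below (illustrative names) estimates it with the classic 2-approximation, the minimum spanning tree of the terminals' shortest-path metric.

    import heapq

    def shortest_paths(graph, src):
        dist, pq = {src: 0}, [(0, src)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist.get(u, float("inf")):
                continue
            for v, w in graph[u]:
                if d + w < dist.get(v, float("inf")):
                    dist[v] = d + w
                    heapq.heappush(pq, (d + w, v))
        return dist

    def steiner_2approx(graph, terminals):
        # MST over the terminal-to-terminal shortest-path metric: a
        # classic 2-approximation of the optimal Steiner tree cost.
        dist = {t: shortest_paths(graph, t) for t in terminals}
        in_tree, cost = {terminals[0]}, 0
        while len(in_tree) < len(terminals):
            w, t = min((dist[a][b], b) for a in in_tree
                       for b in terminals if b not in in_tree)
            in_tree.add(t)
            cost += w
        return cost

    # Star: center c, three terminal leaves at distance 1 each.
    g = {"c": [("t1", 1), ("t2", 1), ("t3", 1)],
         "t1": [("c", 1)], "t2": [("c", 1)], "t3": [("c", 1)]}
    st = steiner_2approx(g, ["t1", "t2", "t3"])
    n = 100                                     # input bits per terminal
    print(st, "~ Steiner cost; Set-Disjointness cost ~", n * st)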

    Stochastic tasks: difficulty and Levin search

    We establish a setting for asynchronous stochastic tasks that accounts for episodes, rewards and responses, and, most especially, the computational complexity of the algorithm behind an agent solving a task. This is used to determine the difficulty of a task as the (logarithm of the) number of computational steps required to acquire an acceptable policy for the task, which includes the exploration of policies and their verification. We also analyse instance difficulty, task compositions and decompositions. This work has been partially supported by the EU (FEDER) and the Spanish MINECO under grants TIN 2010-21062-C02-02, PCIN-2013-037 and TIN 2013-45732-C4-1-P, and by Generalitat Valenciana PROMETEOII 2015/013. Hernández-Orallo, J. (2015). Stochastic tasks: difficulty and Levin search. In Artificial General Intelligence, Springer International Publishing, 90-100. http://hdl.handle.net/10251/66686
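
    Levin search itself can be sketched compactly (toy interpreter and verifier; in the paper's setting, policies run against the stochastic task and are verified over episodes): programs are interleaved with time shares shrinking as $2^{-\ell(p)}$ in the program length $\ell(p)$, and the difficulty readout is the logarithm of the total work until an acceptable policy is found.

    import itertools

    def levin_search(interpret, acceptable, max_phase=20):
        # Phase t spends about 2^t steps overall: each program p of
        # length l gets a budget of 2^(t - l) steps, so time shares
        # shrink as 2^(-l(p)).
        total_steps = 0
        for phase in range(1, max_phase + 1):
            for length in range(1, phase + 1):
                budget = 2 ** (phase - length)
                for bits in itertools.product("01", repeat=length):
                    prog = "".join(bits)
                    result, used = interpret(prog, budget)
                    total_steps += used
                    if result is not None and acceptable(result):
                        # difficulty ~ log2 of total work performed
                        return prog, total_steps.bit_length()
        return None, total_steps.bit_length()

    # Toy interpreter: "running" prog takes len(prog) steps, outputs prog.
    def interpret(prog, budget):
        return (prog, len(prog)) if len(prog) <= budget else (None, budget)

    print(levin_search(interpret, lambda policy: policy == "1011"))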

    Learning Ordinal Preferences on Multiattribute Domains: the Case of CP-nets

    A recurrent issue in decision making is to extract a preference structure by observing the user's behavior in different situations. In this paper, we investigate the problem of learning ordinal preference orderings over discrete multi-attribute, or combinatorial, domains. Specifically, we focus on the learnability of conditional preference networks, or CP-nets, that have recently emerged as a popular graphical language for representing ordinal preferences in a concise and intuitive manner. This paper provides results in both passive and active learning. In the passive setting, the learner aims at finding a CP-net compatible with a supplied set of examples, while in the active setting the learner searches for the cheapest interaction policy with the user for acquiring the target CP-net.
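
    The object being learned is easy to make concrete. A minimal CP-net sketch (toy domain, illustrative names): each variable carries a conditional preference table indexed by its parents' values, and swap comparisons (outcomes differing on a single variable) are exactly the kind of examples the learning algorithms consume.

    # Each variable has parents and a conditional preference table (CPT)
    # giving a value ordering for every parent assignment.
    cpnet = {
        "main": {"parents": [], "cpt": {(): ["fish", "meat"]}},
        "wine": {"parents": ["main"],
                 "cpt": {("fish",): ["white", "red"],
                         ("meat",): ["red", "white"]}},
    }

    def prefers(net, o1, o2):
        # Swap comparison: o1 and o2 must differ on exactly one variable.
        diff = [v for v in net if o1[v] != o2[v]]
        assert len(diff) == 1, "only swap pairs are directly comparable"
        var = diff[0]
        ctx = tuple(o1[p] for p in net[var]["parents"])
        order = net[var]["cpt"][ctx]
        return order.index(o1[var]) < order.index(o2[var])

    a = {"main": "meat", "wine": "red"}
    b = {"main": "meat", "wine": "white"}
    print(prefers(cpnet, a, b))      # True: red is preferred with meat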

    Risk-Averse Matchings over Uncertain Graph Databases

    A large number of applications, such as querying sensor networks and analyzing protein-protein interaction (PPI) networks, rely on mining uncertain graph and hypergraph databases. In this work we study the following problem: given an uncertain, weighted (hyper)graph, how can we efficiently find a (hyper)matching with high expected reward and low risk? This problem naturally arises in the context of several important applications, such as online dating, kidney exchanges, and team formation. We introduce a novel formulation for finding matchings with maximum expected reward and bounded risk under a general model of uncertain weighted (hyper)graphs that we introduce in this work. Our model generalizes probabilistic models used in prior work, and captures both continuous and discrete probability distributions, thus allowing us to handle privacy-related applications that inject appropriately distributed noise into (hyper)edge weights. Given that our optimization problem is NP-hard, we turn our attention to designing efficient approximation algorithms. For the case of uncertain weighted graphs, we provide a $\frac{1}{3}$-approximation algorithm, and a $\frac{1}{5}$-approximation algorithm with near-optimal running time. For the case of uncertain weighted hypergraphs, we provide an $\Omega(\frac{1}{k})$-approximation algorithm, where $k$ is the rank of the hypergraph (i.e., any hyperedge includes at most $k$ nodes), that runs in almost (modulo log factors) linear time. We complement our theoretical results by testing our approximation algorithms on a wide variety of synthetic experiments, where we observe, in a controlled setting, interesting findings on the trade-off between reward and risk. We also provide an application of our formulation to recommending teams that are likely to collaborate and have high impact. Comment: 25 pages.
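
    The reward/risk trade-off can be illustrated with a hedged greedy sketch (illustrative names; this is not the paper's actual $\frac{1}{3}$-approximation algorithm, whose analysis is more careful): scan edges by decreasing expected reward, keeping an edge when both endpoints are free and the accumulated risk stays within budget.

    def greedy_bounded_risk_matching(edges, risk_budget):
        # edges: (u, v, expected_reward, risk); scan by decreasing reward,
        # keep an edge if both endpoints are free and risk fits the budget.
        matched, total_risk, picked = set(), 0.0, []
        for u, v, reward, risk in sorted(edges, key=lambda e: -e[2]):
            if u in matched or v in matched:
                continue
            if total_risk + risk > risk_budget:
                continue
            picked.append((u, v))
            matched |= {u, v}
            total_risk += risk
        return picked, total_risk

    edges = [("a", "b", 10.0, 4.0), ("b", "c", 9.0, 1.0),
             ("c", "d", 8.0, 1.0)]
    print(greedy_bounded_risk_matching(edges, 2.0))
    # ([('b', 'c')], 1.0): the risky high-reward edge is skipped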